Problem: The ops lead for a customer's staging farm, who had just stood up a brand-new Proxmox cluster, asked me for thirty near-identical Ubuntu containers by lunchtime. Clicking through the Proxmox GUI and typing in hostnames, usernames, and SSH keys for each one would have taken half the day, and a single typo in a hostname would have meant tearing the container down and starting again.
Constraints: RAM was tight, so lightweight LXC containers were the only option; the overhead of full VMs was off the table. The customer also wanted Ubuntu 24.04 LTS with zero configuration drift. Neither requirement was negotiable, so every container had to come up identical: the same hardened SSH configuration, the same monitoring agent, and a single known set of users, with no manual steps after creation.
Solution: I built a custom LXC template from the official Ubuntu cloud image and wired provisioning into Proxmox's snippet-based cloud-init user-data facility, which let me fulfil the entire request in roughly four minutes. That first deployment went so smoothly it felt like cheating, and I have not provisioned a container by hand since.
Quick Summary
- Create an LXC template from the official Ubuntu cloud image and cache it in /var/lib/vz/template/cache.
- Write a cloud-init user-data snippet that sets the hostname, SSH key, packages, and runcmd directives on each container's first boot. Attach the snippet to the Proxmox container config with cicustom and let cloud-init do the rest.
- Once everything is set up the way you want it, automate further with Terraform (Proxmox provider) and Ansible for Day 2 configuration.
Testing Info: Proxmox VE 8.2 (1 node, Dell R730xd), Ubuntu 24.04 LTS cloud image (ubuntu-24.04-cloudimg-amd64-root.tar.xz), Cloud-Init 24.1, Terraform 1.8 using the telmate/proxmox Provider, Ansible 9.0.
Prerequisites & Planning: Mapping the Automation Stack
Before you run any commands from your choice of terminal/CLI, you should first decide your tool stack and your source for the image.
The idea is to create a repeatable process that goes from "I need a container" to a fully configured system without running any interactive pct commands.
Tools
- A Proxmox server. The Ubuntu cloud image already includes cloud-init, so you don’t need to install it on the host. (If you’re building a custom template from scratch, just make sure cloud-init is installed inside it.)
- The rootfs tar.xz of the Ubuntu cloud image from the Ubuntu Cloud Images download page. I always take the -root.tar.xz version since Proxmox expects a standard root filesystem rather than a raw disk.
- A place to store your snippet files: /var/lib/vz/snippets/, which is where Proxmox will look for your cicustom files.
Image Decision: You can use the -disk.img qcow2 image; however, extracting the rootfs from it is a manual process (see the pitfalls section). The -root.tar.xz is the easiest path to take and the one that is understood natively by the Proxmox pct create.
Cloud-Init Data Source: Proxmox LXC Containers utilize the NoCloud data source; this data source reads user-data and meta-data from the attached cicustom snippet. You don’t need a separate metadata service—Proxmox injects the snippet content straight into /var/lib/cloud/seed/nocloud/.
Crafting the Cloud-Init User-Data for Automated Configuration
The cloud-init user-data snippet is simply a YAML file that cloud-init consumes on first boot. I keep mine at /var/lib/vz/snippets/cloud-init-userdata.yml and point the container's cicustom setting at it during provisioning.
#cloud-config
hostname: web-node-01
manage_etc_hosts: true
users:
  - name: deploy
    gecos: Deployment Account
    sudo: ALL=(ALL) NOPASSWD:ALL
    lock_passwd: true
    ssh_authorized_keys:
      - ssh-ed25519 AAAAC3... deploy@workstation
package_update: true
packages:
  - htop
  - curl
  - qemu-guest-agent
runcmd:
  - [ systemctl, enable, --now, qemu-guest-agent ]
  - echo "Provisioning complete at $(date)" >> /var/log/provision.log
The first line, #cloud-config, is required: without it, cloud-init does not treat the file as cloud-config and skips the configuration modules. I try to keep snippets as short as possible so there is less to go wrong when Terraform provisions containers at scale.
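Thirty containers need thirty hostnames, so a single hard-coded snippet only gets you so far. The sketch below shows one way to stamp out a snippet per host from a shell heredoc. The helper name render_userdata, the SNIPPET_DIR variable, and the three example hostnames are my own assumptions for illustration, and the package list is trimmed to keep the example short.

```shell
#!/bin/sh
# Sketch: print a cloud-config document for one container.
# render_userdata is a hypothetical helper, not a Proxmox or cloud-init tool.
render_userdata() {
    host="$1"      # hostname for this container
    pubkey="$2"    # full public key line for the deploy user
    cat <<EOF
#cloud-config
hostname: ${host}
manage_etc_hosts: true
users:
  - name: deploy
    gecos: Deployment Account
    sudo: ALL=(ALL) NOPASSWD:ALL
    lock_passwd: true
    ssh_authorized_keys:
      - ${pubkey}
package_update: true
packages:
  - htop
  - curl
EOF
}

# Example: write one snippet per host into a scratch directory.
# SNIPPET_DIR is an assumed variable; it defaults to a temp dir so the demo is harmless.
out_dir="${SNIPPET_DIR:-$(mktemp -d)}"
for n in 01 02 03; do
    render_userdata "web-node-${n}" "ssh-ed25519 AAAAC3... deploy@workstation" \
        > "${out_dir}/web-node-${n}.yml"
done
```

Setting SNIPPET_DIR=/var/lib/vz/snippets before running it would drop the files where Proxmox looks for snippets; each container's cicustom line then has to name its own file.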
How it works. The cloud-init NoCloud datasource documentation has the details, but in short: when cloud-init finds user-data in the seed directory, it runs the same modules that run on AWS or OpenStack instances. The hostname module writes /etc/hostname, the users module creates the accounts and installs the specified SSH keys, and so on, all without anyone signing in to the instance.
Verify after first boot: once the container is up, SSH into it and run cloud-init status --wait; it should report status: done.
Building a Custom LXC Template from an Ubuntu Cloud Image
The first step I follow is to download the required rootfs archive into the Proxmox template cache.
root@pve:~# wget -P /var/lib/vz/template/cache/ \
https://cloud-images.ubuntu.com/noble/current/ubuntu-24.04-cloudimg-amd64-root.tar.xz
Then, I create the CT to be used as the base template for subsequent user-created CTs. In my case, the ID for my template CT is 9000, which I designate for CTs that will be used as base templates. I set --unprivileged 1 for security and --features nesting=1 because later I might need nested container support (like running Docker). The cgroup2 permissions that cloud-init requires are a separate fix we’ll handle in the pitfalls section.
root@pve:~# pct create 9000 /var/lib/vz/template/cache/ubuntu-24.04-cloudimg-amd64-root.tar.xz \
--hostname cloud-template \
--storage local-lvm \
--rootfs local-lvm:8 \
--memory 512 \
--cores 1 \
--net0 name=eth0,bridge=vmbr0,ip=dhcp \
--unprivileged 1 \
--features nesting=1 \
--description "Base LXC template with cloud-init"
The pct create command above unpacks the image from the cache, sets the container's resource limits, and adds a network interface. At this point the container is not running; it exists only as a configuration file and a root volume on disk.
To confirm the template was created correctly, run pct list: it should show CT 9000 with a status of stopped.
root@pve:~# pct list
VMID Status Lock Name
9000 stopped cloud-template
Injecting Cloud-Init Configuration and First Boot
Before you modify the configuration file you should save a copy of it:
root@pve:~# cp /etc/pve/lxc/9000.conf /etc/pve/lxc/9000.conf.bak
It is also wise to keep a second root shell open so that, if the edit goes wrong, you can roll back to the previous version immediately.
Next add the following line to the bottom of the /etc/pve/lxc/9000.conf:
cicustom: user=local:snippets/cloud-init-userdata.yml
Once the entry is in place, Proxmox looks in /var/lib/vz/snippets/ (the snippets directory of the local storage) for the file and places it in the NoCloud seed directory. Next, start the container and watch the console output while cloud-init runs through its stages.
root@pve:~# pct start 9000
root@pve:~# pct console 9000
[ OK ] Finished Wait until cloud-init is done.
Ubuntu 24.04 LTS cloud-template tty1
cloud-template login:
After cloud-init finishes, SSH into the container and check the provisioning log to confirm how it was provisioned:
deploy@cloud-template:~$ tail -1 /var/log/provision.log
Provisioning complete at Sun Sep 15 10:25:33 UTC 2024
deploy@cloud-template:~$ cloud-init status --wait
status: done
Your container is now a fully provisioned cloud-init node. I generally stop CT 9000 at this point and mark it as a template so that I can clone it quickly in future.
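Cloning from the template is where the thirty-container request turns into a loop. The sketch below prints the pct clone commands for a batch instead of running them, so you can review the IDs and hostnames first. The helper name clone_batch, the DRY_RUN switch, and the starting ID 101 are assumptions of mine; pct clone and its --hostname option are the real CLI.

```shell
#!/bin/sh
# Sketch: clone COUNT containers from a template CT.
# clone_batch and DRY_RUN are hypothetical; pct clone is the real Proxmox command.
clone_batch() {
    template="$1"; first_id="$2"; count="$3"; prefix="$4"
    i=0
    while [ "$i" -lt "$count" ]; do
        vmid=$((first_id + i))
        name=$(printf '%s-%02d' "$prefix" $((i + 1)))
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default: only show what would be executed.
            echo "pct clone ${template} ${vmid} --hostname ${name}"
        else
            pct clone "$template" "$vmid" --hostname "$name" || return 1
        fi
        i=$((i + 1))
    done
}

# Preview the commands for three clones starting at CT 101.
clone_batch 9000 101 3 web-node
```

Running it with DRY_RUN=0 on the Proxmox host executes the clones for real. The preview default is deliberate: a wrong starting ID is much cheaper to catch in printed output than in thirty half-created containers.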
Optimization: Integrating Infrastructure as Code Tools
Automating with Terraform’s Proxmox Provider
I use the telmate/proxmox provider to define containers along with their cloud-init snippets. The following resource is a basic equivalent of the manual configuration above.
resource "proxmox_lxc" "web" {
  target_node  = "pve"
  hostname     = "web-node-01"
  ostemplate   = "local:vztmpl/ubuntu-24.04-cloudimg-amd64-root.tar.xz"
  unprivileged = true

  rootfs {
    storage = "local-lvm"
    size    = "8G"
  }

  network {
    name   = "eth0"
    bridge = "vmbr0"
    ip     = "dhcp"
  }

  cicustom = "user=local:snippets/cloud-init-userdata.yml"
}
When the snippet changes, the provider can rebuild the container within seconds, so no manual teardown is needed.
Post-Provisioning Configuration with Ansible Container Deployment
Once cloud-init has completed, SSH access can be used to deploy Ansible playbooks. Based on the type of environment, I frequently use dynamic inventory to query the Proxmox API; however, for smaller environments, a basic static inventory will work well.
- name: Harden SSH and install monitoring
  hosts: container
  become: true
  tasks:
    - name: Disable password authentication
      ansible.builtin.lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?PasswordAuthentication'
        line: "PasswordAuthentication no"
        state: present
      notify: restart sshd
    - name: Install node_exporter
      ansible.builtin.apt:
        name: prometheus-node-exporter
        state: latest
  handlers:
    # The notify above needs a matching handler; on Ubuntu the unit is named ssh.
    - name: restart sshd
      ansible.builtin.service:
        name: ssh
        state: restarted
Service check: Run systemctl is-active prometheus-node-exporter on the target container—it should return active.
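The play above targets a group called container, so a static inventory only needs that one group. As a rough sketch, the shell function below emits an INI inventory for a numbered batch of hosts. The helper name make_inventory and the web-node prefix are assumptions of mine, and it presumes the hostnames resolve from the control machine (via DNS or /etc/hosts).

```shell
#!/bin/sh
# Sketch: emit a static INI inventory for the "container" group used by the play.
# make_inventory is a hypothetical helper; the hostnames must be resolvable.
make_inventory() {
    prefix="$1"; count="$2"
    echo "[container]"
    i=1
    while [ "$i" -le "$count" ]; do
        # ansible_user matches the deploy account created by cloud-init.
        printf '%s-%02d ansible_user=deploy\n' "$prefix" "$i"
        i=$((i + 1))
    done
}

make_inventory web-node 3
```

Redirect the output to a file and pass that file to ansible-playbook with -i to run the play against the whole batch.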
Real-World Pitfalls: Hard Lessons from Production
My Setup Headaches
The Silent Network Failure: lxc.cgroup2 Denied cloud‑init’s Network Config
I deployed a new unprivileged container, gave it a static IP via user-data, and fired it up. After boot, I tried SSH—nothing. The console showed only the loopback interface was up, and cloud-init’s cc_netinfo module had failed.
After obtaining the log files, the cloud-init log showed the following:
2024-09-15 10:23:01,123 - stages.py[WARNING]: Failed to apply network config: No valid network interface found
...
2024-09-15 10:23:01,789 - handlers.py[ERROR]: kernel denied access to cgroup /sys/fs/cgroup/... <--
The No valid network interface found message tells you cloud-init couldn’t create or configure eth0 because the container’s cgroup2 profile blocked the operation.
I dug into the Proxmox forum thread on lxc.cgroup2 and cloud‑init and found that Proxmox’s unprivileged container profile is restrictive by default. cloud‑init needs access to the network namespace and the tun device to apply netplan changes. The fix is to whitelist the required devices.
Add these lines to the container config (/etc/pve/lxc/<ID>.conf) before the first boot:
lxc.cgroup2.devices.allow: c 10:200 rwm
lxc.cgroup2.devices.allow: c 136:* rwm
The first line allows the tun device (major 10, minor 200), and the second covers pts devices that cloud‑init sometimes touches. After a stop/start cycle, cloud‑init applied the network config without a whimper.
Testing the Fix: Run ip addr show eth0 inside the container. You should see the interface up and carrying an IP address.
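Editing thirty config files by hand invites duplicated or missing lines, so I find it safer to script the change. The sketch below appends a line only when it is not already present. The helper name ensure_line and the CONF variable are my own, and it assumes the config file has no snapshot sections, since a line appended after one would belong to that snapshot rather than to the live config.

```shell
#!/bin/sh
# Sketch: append a line to a config file only if it is not already there.
# ensure_line is a hypothetical helper, not part of Proxmox.
ensure_line() {
    file="$1"; line="$2"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo against a scratch file; point CONF at /etc/pve/lxc/<ID>.conf for real use.
CONF="${CONF:-$(mktemp)}"
ensure_line "$CONF" "lxc.cgroup2.devices.allow: c 10:200 rwm"
ensure_line "$CONF" "lxc.cgroup2.devices.allow: c 136:* rwm"
ensure_line "$CONF" "lxc.cgroup2.devices.allow: c 10:200 rwm"   # already present, so nothing is added
```

Because the check is a fixed-string, whole-line match, the asterisk in the pts rule is compared literally and re-running the script leaves the file unchanged.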
Edge Case: When Cloud-Init Won’t Re‑run on Restart
I had a situation with a container where the user‑data snippet contained a typo within the runcmd. cloud-init was able to gracefully handle the situation, but after correcting the typo and restarting the container, cloud-init would not run again since the instance was already marked as “done”.
You can force cloud-init to reset its state by running the following commands, either via pct enter from the Proxmox host or directly inside the container:
root@container:~# cloud-init clean --logs
root@container:~# reboot
The clean --logs removes the internal marker files, so cloud‑init re‑executes all modules on next boot. No need to rebuild the container from scratch.
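When the same bad snippet has reached several containers, the reset can be driven from the host instead of logging in to each one. This is a minimal sketch using pct exec and pct reboot; the helper name reset_cloud_init, the DRY_RUN switch, and the container IDs are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: reset cloud-init state in several containers from the Proxmox host.
# reset_cloud_init and DRY_RUN are hypothetical; pct exec and pct reboot are real subcommands.
reset_cloud_init() {
    for vmid in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default: print the commands instead of running them.
            echo "pct exec ${vmid} -- cloud-init clean --logs"
            echo "pct reboot ${vmid}"
        else
            pct exec "$vmid" -- cloud-init clean --logs && pct reboot "$vmid" || return 1
        fi
    done
}

reset_cloud_init 101 102
```

With DRY_RUN=0 the function stops at the first container where either command fails, which keeps a broken node from being silently skipped.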
Common Mistake: Using a Raw qcow2 Image Without Template Conversion
It is quite common for beginners to download ubuntu-24.04-cloudimg-amd64-disk.img and try to use it as the ostemplate for pct create. This fails because pct expects a tar archive of the root filesystem and does not recognize a qcow2 disk image (which carries a partition table).
If all you have is the qcow2 image, you will need to extract the root filesystem and repackage it as a tar.xz file. Below are the commands I use to do this quickly:
root@pve:~# modprobe nbd max_part=8
root@pve:~# qemu-nbd --connect=/dev/nbd0 ubuntu-24.04-cloudimg-amd64-disk.img
root@pve:~# mount /dev/nbd0p1 /mnt
root@pve:~# tar -C /mnt -cJf /var/lib/vz/template/cache/ubuntu-24.04-custom.tar.xz .
root@pve:~# umount /mnt && qemu-nbd --disconnect /dev/nbd0
Once the tar.xz file is in the cache, pct create works the same way as before.
Frequently Asked Questions
Can I Use Cloud-Init with Unprivileged LXC Containers on Proxmox?
Certainly. In fact, I utilize unprivileged LXC containers exclusively for security reasons. The only additional modification you need to make for cloud-init to set up the network is to relax the lxc.cgroup2 restrictions. The method of cicustom injection will remain the same as for privileged LXC containers.
Why Does My Ubuntu Cloud Image Container Fail to Boot After Applying User-Data?
Most often the cause is either a malformed snippet or a missing #cloud-config header. Check the syntax of your snippet by running cloud-init schema --system inside the container. If the network configuration fails, review the cgroup denial described above, which shows up as a kernel denied access error.
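The missing-header case is cheap to catch on the host before any container boots. A minimal sketch, assuming a helper I have named check_header that only inspects the first line of the file and does no YAML validation:

```shell
#!/bin/sh
# Sketch: fail fast if a snippet does not start with the #cloud-config header.
# check_header is a hypothetical helper; it checks the first line only.
check_header() {
    first_line=$(head -n 1 "$1")
    if [ "$first_line" = "#cloud-config" ]; then
        echo "ok: $1"
    else
        echo "missing #cloud-config header: $1" >&2
        return 1
    fi
}

# Demo on a scratch file so the example runs anywhere.
demo=$(mktemp)
printf '#cloud-config\nhostname: web-node-01\n' > "$demo"
check_header "$demo"
rm -f "$demo"
```

Run against the files in /var/lib/vz/snippets/, the non-zero return code makes it easy to stop a provisioning script before a bad snippet is attached.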
How Can I Automate Initial LXC Provisioning Using Terraform and Cloud-Init?
Declare a proxmox_lxc resource in your Terraform code with the cicustom argument pointing to your snippet. Terraform will create a new container each time your snippet changes, which keeps provisioning idempotent. Use Ansible after boot to update and harden the new container, which gives you a full GitOps pipeline.